AITopics | eluder dimension

Eluder dimension: localise it!

Neural Information Processing Systems, Jun-23-2026, 07:45:54 GMT

We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.
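For readers unfamiliar with the quantity being localised, here is the standard definition introduced by Russo and Van Roy. A point x is ε-dependent on earlier points x_1, ..., x_n with respect to a class F if any two functions in F whose Euclidean distance on x_1, ..., x_n, sqrt(Σ (f(x_i) - g(x_i))^2), is at most ε also differ by at most ε at x. The ε-eluder dimension is the length of the longest sequence in which every point is ε'-independent of its predecessors for some ε' ≥ ε. The Python sketch below brute-forces that length for a tiny finite class on a finite domain at the single scale ε' = ε. It therefore returns a lower bound on the eluder dimension, not the full quantity. The function names are hypothetical, the search is exponential in the domain size, and the sketch does not implement the paper's localised variant.

```python
import itertools
import math
from typing import List, Sequence

# A function in the finite class is given by its table of values on the
# finite domain {0, ..., d-1}: f[x] is the value at point x.
Func = Sequence[float]


def is_eps_independent(x: int, prefix: List[int], funcs: List[Func], eps: float) -> bool:
    """True if x is eps-independent of prefix with respect to funcs.

    x is eps-dependent on prefix if every pair (f, g) with
    sqrt(sum_{p in prefix} (f[p] - g[p])**2) <= eps also satisfies
    |f[x] - g[x]| <= eps; it is eps-independent otherwise.
    """
    for f, g in itertools.combinations(funcs, 2):
        dist_on_prefix = math.sqrt(sum((f[p] - g[p]) ** 2 for p in prefix))
        if dist_on_prefix <= eps and abs(f[x] - g[x]) > eps:
            return True
    return False


def longest_independent_sequence(domain_size: int, funcs: List[Func], eps: float) -> int:
    """Length of the longest sequence in which every point is eps-independent
    of its predecessors, found by exhaustive depth-first search.

    A point already in the prefix is always eps-dependent on it, so the
    search never repeats points and terminates.
    """
    best = 0

    def extend(prefix: List[int]) -> None:
        nonlocal best
        best = max(best, len(prefix))
        for x in range(domain_size):
            if is_eps_independent(x, prefix, funcs, eps):
                extend(prefix + [x])

    extend([])
    return best


# Usage: linear functions f_theta(e_i) = theta_i with theta in {0, 1}^3,
# evaluated on the three standard basis vectors e_0, e_1, e_2.
linear_class = [list(theta) for theta in itertools.product([0, 1], repeat=3)]
print(longest_independent_sequence(3, linear_class, eps=0.5))  # 3
```

On the basis vectors with parameters θ ∈ {0, 1}^3, each new coordinate can be set independently of the ones already seen, so the search finds a sequence of length 3.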

dimension, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > Canada (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)


Eluder dimension: localise it!

Neural Information Processing Systems, Jun-17-2026, 12:18:42 GMT

We establish a lower bound on the eluder dimension in generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

Neural Information Processing Systems, May-1-2026, 01:33:10 GMT

While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with the optimal achievable cost. In particular, our bounds converge much faster than those of non-distributional approaches when the optimal cost is small. As a warm-up, we propose a distributional contextual bandit (DistCB) algorithm, which we show enjoys small-loss regret bounds and empirically outperforms the state of the art on three real-world tasks. In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation. We prove that our algorithm enjoys novel small-loss PAC bounds in low-rank MDPs. As part of our analysis, we introduce the ℓ1 distributional eluder dimension, which may be of independent interest. Then, in offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and more robust to poor single-policy coverage.
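The abstract says the online algorithm builds confidence sets by maximum likelihood estimation. A common generic form of such a set keeps every candidate model whose log-likelihood on the observed data is within a threshold β of the maximum. The sketch below shows that generic construction for a hypothetical finite class of categorical outcome distributions (for example, discretised returns). The candidate names, the discretisation, and the value of `beta` are assumptions for illustration. This is not the paper's exact construction or its choice of threshold.

```python
import math
from typing import Dict, List, Sequence


def log_likelihood(probs: Sequence[float], samples: List[int]) -> float:
    """Log-likelihood of observed outcome indices under a categorical distribution."""
    total = 0.0
    for s in samples:
        if probs[s] <= 0.0:
            return -math.inf  # the model rules out an observed outcome
        total += math.log(probs[s])
    return total


def mle_confidence_set(candidates: Dict[str, Sequence[float]],
                       samples: List[int], beta: float) -> List[str]:
    """Keep every candidate whose log-likelihood is within beta of the maximum."""
    scores = {name: log_likelihood(p, samples) for name, p in candidates.items()}
    best = max(scores.values())
    return [name for name, ll in scores.items() if ll >= best - beta]


# Usage: three hypothetical models over two discretised outcomes {0, 1}.
candidates = {"skewed": [0.9, 0.1], "uniform": [0.5, 0.5], "reversed": [0.1, 0.9]}
observed = [0, 0, 0, 0, 1]
print(mle_confidence_set(candidates, observed, beta=1.0))  # ['skewed', 'uniform']
```

With four observations of outcome 0 and one of outcome 1, a threshold of 1.0 retains both the skewed and the uniform model, while 0.5 retains only the skewed one. A larger `beta` gives a wider, more conservative set.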


efb9629755e598c4f261c44aeb6fde5e-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 06:10:13 GMT


Selective Sampling and Imitation Learning via Online Regression

Neural Information Processing Systems, Apr-29-2026, 21:36:39 GMT

We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much prior work assumes access to noiseless expert feedback, which is impractical in many applications. In fact, when only noisy expert feedback is available, algorithms that rely purely on offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to succeed. In contrast, in this work we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold. First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries.
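The core selective-sampling idea is to query the expert only when the learner is uncertain. It can be illustrated with a simpler rule than the paper's online-regression algorithm. The sketch below swaps in a disagreement-based rule over a hypothetical finite class of policies. A policy stays plausible while its number of mismatches with the (possibly noisy) queried labels is within `slack` of the best policy. The expert is queried only when the plausible policies predict different actions. The function names, the threshold policies, and the `slack` parameter are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Policy = Callable[[float], int]  # maps a context to an action


def plausible_policies(policies: Dict[str, Policy],
                       history: List[Tuple[float, int]], slack: int) -> List[str]:
    """Policies whose mismatches with the queried expert labels are within
    `slack` of the best policy; the slack tolerates a noisy expert."""
    mistakes = {
        name: sum(pi(x) != y for x, y in history) for name, pi in policies.items()
    }
    best = min(mistakes.values())
    return [name for name, m in mistakes.items() if m <= best + slack]


def should_query(policies: Dict[str, Policy], history: List[Tuple[float, int]],
                 x: float, slack: int) -> bool:
    """Query the expert on context x only if the plausible policies disagree on x."""
    names = plausible_policies(policies, history, slack)
    return len({policies[n](x) for n in names}) > 1


# Usage: hypothetical threshold policies choosing action 1 when x >= t.
policies = {f"t={t}": (lambda x, t=t: int(x >= t)) for t in (0.25, 0.5, 0.75)}
print(should_query(policies, [], 0.1, slack=0))  # False: all policies pick action 0
print(should_query(policies, [], 0.6, slack=0))  # True: t=0.75 disagrees
```

Raising `slack` keeps more policies plausible after noisy labels, which trades extra queries for robustness to expert mistakes.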

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Genre: Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Education (0.67)
Transportation (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)


A Unified Model and Dimension for Interactive Estimation

Neural Information Processing Systems, Apr-29-2026, 19:05:04 GMT

We study an abstract framework for interactive learning called interactive estimation in which the goal is to estimate a target from its "similarity" to points queried by the learner. We introduce a combinatorial measure called dissimilarity dimension which is used to derive learnability bounds in our model. We present a simple, general, and broadly-applicable algorithm, for which we obtain both regret and PAC generalization bounds that are polynomial in the new dimension. We show that our framework subsumes and thereby unifies two classic learning models: statistical-query learning and structured bandits. We also delineate how the dissimilarity dimension is related to well-known parameters for both frameworks, in some cases yielding significantly improved analyses.

artificial intelligence, dimension, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Going Beyond Linear RL: Sample Efficient Neural Function Approximation

Neural Information Processing Systems, Apr-25-2026, 18:37:38 GMT

Deep reinforcement learning (RL) powered by neural network approximation of the Q-function has had enormous empirical success. While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural network approximations of the Q-function. This is the focus of this work, where we study function approximation with two-layer neural networks (considering both ReLU and polynomial activation functions). Our first result is a computationally and statistically efficient algorithm in the generative model setting under completeness for two-layer neural networks. Our second result considers the same setting but assumes only realizability of the neural network function class.
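A two-layer network Q-function computes Q(s, a) = Σ_i v_i σ(⟨w_i, φ(s, a)⟩) for an activation σ. The Python sketch below evaluates this form with ReLU and with a quadratic activation standing in for the polynomial case. It assumes the feature map φ(s, a) is given as a plain vector and omits bias terms, and the names are hypothetical. It illustrates only the function class, not the paper's learning algorithms.

```python
from typing import Callable, List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def quadratic(z: float) -> float:
    # A degree-2 polynomial activation, chosen here only as an example.
    return z * z


def two_layer_q(features: Sequence[float], W: List[Sequence[float]],
                v: Sequence[float], activation: Callable[[float], float]) -> float:
    """Q(s, a) = sum_i v[i] * activation(<W[i], features>), where
    `features` is the (assumed given) feature vector phi(s, a)."""
    if len(W) != len(v):
        raise ValueError("W and v must have one entry per hidden unit")
    total = 0.0
    for w_i, v_i in zip(W, v):
        pre_activation = sum(w * x for w, x in zip(w_i, features))
        total += v_i * activation(pre_activation)
    return total


# Usage: two hidden units with identity first-layer weights.
phi = [1.0, -1.0]
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
print(two_layer_q(phi, W, v, relu), two_layer_q(phi, W, v, quadratic))  # 1.0 2.0
```

Under ReLU the second unit's negative pre-activation is zeroed out. Under the quadratic activation it contributes (-1)^2 = 1.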

artificial intelligence, fuzzy logic, machine learning

Neural Information Processing Systems

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.85)


Experiment Planning with Function Approximation

Neural Information Processing Systems, Apr-25-2026, 16:03:22 GMT

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where deploying adaptive algorithms incurs significant overhead (for example, when the data collection policies must be executed in a distributed fashion, or when a human in the loop is needed to implement them), producing a set of data collection policies in advance is paramount. We study the setting where a large dataset of contexts, but not rewards, is available and may be used by the learner to design an effective data collection strategy. Although this problem has been well studied when rewards are linear [53], results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates when the number of actions is small. Finally, we introduce a statistical gap that highlights the fundamental differences between planning and adaptive learning, and we provide results for planning with model selection.
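The uniform sampler mentioned in the abstract is the simplest non-adaptive plan. Before any reward is observed, each context in the unlabeled dataset is paired with an action drawn uniformly at random. The sketch below assumes a finite action set indexed 0, ..., K - 1 and a seeded random generator, so the plan can be reproduced and shipped to distributed workers in one batch. `uniform_plan` is a hypothetical name, and the later step of fitting a reward model to the collected data is left out.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Context = TypeVar("Context")


def uniform_plan(contexts: Sequence[Context], num_actions: int,
                 seed: int = 0) -> List[Tuple[Context, int]]:
    """Pre-compute a data collection plan: for every unlabeled context, draw
    an action uniformly at random from {0, ..., num_actions - 1}.

    The plan is fixed before any reward is observed, so it can be deployed
    non-adaptively.
    """
    if num_actions < 1:
        raise ValueError("num_actions must be positive")
    rng = random.Random(seed)  # seeded so the same plan can be regenerated
    return [(c, rng.randrange(num_actions)) for c in contexts]


# Usage: plan actions for three hypothetical contexts with two actions.
plan = uniform_plan(["ctx-a", "ctx-b", "ctx-c"], num_actions=2, seed=0)
print(plan)
```

Because the plan does not depend on observed rewards, it avoids the deployment overhead the abstract describes. The trade-off is that it ignores structure in the reward class that an eluder-based planner could exploit.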

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)


Experiment Planning with Function Approximation

Neural Information Processing Systems, Apr-25-2026, 16:03:18 GMT

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where deploying adaptive algorithms incurs significant overhead (for example, when the data collection policies must be executed in a distributed fashion, or when a human in the loop is needed to implement them), producing a set of data collection policies in advance is paramount. We study the setting where a large dataset of contexts, but not rewards, is available and may be used by the learner to design an effective data collection strategy. Although this problem has been well studied when rewards are linear [53], results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates when the number of actions is small. Finally, we introduce a statistical gap that highlights the fundamental differences between planning and adaptive learning, and we provide results for planning with model selection.

data mining, machine learning, reinforcement learning

Neural Information Processing Systems

Genre: Research Report (0.48)

Technology: